{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-CjOraVLBBeV"
      },
      "source": [
        "# MCP Evaluation\n",
        "\n",
        "## Context\n",
        "\n",
        "For the MCP server tool calls evaluation, there are several objectives:\n",
        "\n",
        "1. Help to identify problems in the description of the tools\n",
        "2. Create a test suite that can be run manually or automatically in a CI\n",
        "3. Allow for quick iteration on the tool descriptions\n",
        "\n",
        "We considered several options to evaluate the MCP server tool calls.\n",
        "\n",
        "1. ✍️ **Create test cases manually**\n",
        "\n",
        "- **Pros:**\n",
        "  - Straightforward approach\n",
        "  - Simple to create test cases for each tool\n",
        "\n",
        "- **Cons:**\n",
        "  - A bit complicated to create flows (several tool calls in a row)\n",
        "  - Needs to be maintained (every change in the MCP server will require a change in the test cases). But this is true for any other approach.\n",
        "  - Maintenance can be simplified using LLM that collects the data automatically.\n",
        "\n",
        "2. 📊 **Collect traces using MCP tester client**\n",
        "\n",
        "- **Pros:**\n",
        "  - Testing exactly in the same way as the end users do\n",
        "  - Collects the data automatically\n",
        "  - Easily scalable to many users\n",
        "\n",
        "- **Cons:**\n",
        "  - It might be impossible to get to a wrong flow of tool calls\n",
        "  - So far, handling complicated flows is not straightforward, and one needs to drag all the data, including tool calls\n",
        "  - It can be done using sessions, similarly to how chatbots are tested, but this brings additional complexity. It might well happen that the evaluator gets confused by the session and will not be able to evaluate the tool calls correctly.\n",
        "\n",
        "\n",
        "## Create test cases manually\n",
        "\n",
        "Simple example:\n",
        "```\n",
        "\"What are the best Instagram scrapers\": \"search-actors\"\n",
        "```\n",
        "\n",
        "Flow:\n",
        "```\n",
        "- user: Search for the weather MCP server and then add it into the available tools\n",
        "- assistant: I'll help you to do that\n",
        "- tool_use: search-actors, \"input\": {\"search\": \"weather mcp\",\"limit\": 5}\n",
        "- tool_use_id: 12, content: Tool \\\"search-actors\\\" successful, Actor found: jiri.spilka/wheather-mcp-server\n",
        "- assistant:\n",
        "```\n",
        "\n",
        "Expected tool call: `add-actor`\n",
        "\n",
        "## Evaluation\n",
        "\n",
        "Follow the Phoenix evaluation process:\n",
        "\n",
        "1. **Create the dataset**\n",
        "2. **Define the system prompt and tool definitions**\n",
        "3. **Set up the evaluator**\n",
        "4. **Run the experiment**\n",
        "5. **Iterate and refine**\n",
        "\n",
        "For evaluation, we can either specify ground truth for the tool calls or leverage LLM as a judge. Since we are manually creating test cases, we can directly specify the expected tool calls. However, this does not exclude the possibility of using LLM as a judge at a later stage.\n",
        "\n",
        "## Links\n",
        "\n",
        "- [Tutorial on how to use evals](https://colab.research.google.com/github/Arize-ai/phoenix/blob/main/tutorials/evals/evaluate_agent.ipynb#scrollTo=ANh3q56OojLA).\n",
        "- [System prompts for vscode, cursor](https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools)\n",
        "- [Claude Desktop system prompt](https://github.com/asgeirtj/system_prompts_leaks/blob/main/Anthropic/claude.txt)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qNcScNkfArkU"
      },
      "source": [
        "### Environment setup and project setup\n",
        "\n",
        "You should already have your OpenAI API key.\n",
        "You can find the Phoenix API key in 1Password, under the \"shared\" space."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "lsndh7fH-DSk",
        "outputId": "94430cd2-34fb-4157-86e5-ccb96c4336b5"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Note: you may need to restart the kernel to use updated packages.\n"
          ]
        }
      ],
      "source": [
        "%pip install \"arize-phoenix==12.5.0\" anthropic openai tqdm pandas dotenv --quiet"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "MX-NX_XX-DSm"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/home/jirka/apify/apify-mcp-server/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
            "  from .autonotebook import tqdm as notebook_tqdm\n"
          ]
        }
      ],
      "source": [
        "# If imports fails in first run, run it again\n",
        "\n",
        "import nest_asyncio\n",
        "\n",
        "import json\n",
        "from phoenix import Client as PhoenixClient\n",
        "from phoenix.evals import TOOL_CALLING_PROMPT_TEMPLATE\n",
        "from phoenix.evals.classify import llm_classify\n",
        "from phoenix.evals.models import OpenAIModel\n",
        "from phoenix.experiments import evaluate_experiment, run_experiment\n",
        "from phoenix.experiments.evaluators import create_evaluator\n",
        "from phoenix.experiments.types import Example\n",
        "from phoenix.trace import SpanEvaluations\n",
        "from phoenix.trace.dsl import SpanQuery\n",
        "from openai import OpenAI\n",
        "from anthropic import Anthropic\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "nest_asyncio.apply()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "CM1MjynLAclh",
        "outputId": "c9c81dac-227a-4cdd-d3d9-5dca21b6e711"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "\n",
        "import dotenv\n",
        "dotenv.load_dotenv()\n",
        "\n",
        "\n",
        "model_name=\"gpt-4o-mini\"\n",
        "#model_name=\"gpt-4.1-nano\"\n",
        "#model_name=\"gpt-4.1-mini\"\n",
        "# model_name=\"gpt-4.1\"\n",
        "# model_name=\"gpt-5-nano\"\n",
        "# model_name=\"gpt-5-mini\"\n",
        "#model_name=\"gpt-5\"\n",
        "# model_name=\"claude-3-5-haiku-latest\"\n",
        "# model_name=\"claude-sonnet-4-20250514\"\n",
        "\n",
        "project_name = \"mcp-client\"\n",
        "endpoint = \"https://app.phoenix.arize.com/s/apify\"\n",
        "\n",
        "# Check if env vars exist, only prompt if missing (Phoenix API key is in 1pass)\n",
        "if not os.environ.get(\"PHOENIX_API_KEY\"):\n",
        "    os.environ[\"PHOENIX_API_KEY\"] = getpass(\"Enter YOUR PHOENIX_API_KEY\")\n",
        "\n",
        "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
        "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter YOUR OPENAI_API_KEY\")\n",
        "\n",
        "os.environ[\"PHOENIX_COLLECTOR_ENDPOINT\"] = endpoint\n",
        "os.environ[\"PHOENIX_CLIENT_HEADERS\"] = f\"api_key={os.getenv('PHOENIX_API_KEY')}\"\n",
        "\n",
        "# if not os.environ.get(\"ANTHROPIC_API_KEY\"):\n",
        "    # os.environ[\"ANTHROPIC_API_KEY\"] = getpass(\"Enter YOUR ANTHROPIC_API_KEY\")\n",
        "\n",
        "if not os.environ.get(\"OPENROUTER_API_KEY\"):\n",
        "    os.environ[\"OPENROUTER_API_KEY\"] = getpass(\"Enter YOUR OPENROUTER_API_KEY\")\n",
        "\n",
        "px_client = PhoenixClient(endpoint=endpoint)\n",
        "eval_model = OpenAIModel(model=model_name)\n",
        "\n",
        "openai_client = OpenAI()\n",
        "anthropic_client = Anthropic(timeout=10)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UEvUJPiCRp1a"
      },
      "source": [
        "# Evaluation using manual test cases with ground truth\n",
        "\n",
        "Follow a standard step-by-step process in Phoenix:\n",
        "\n",
        "1. Define system prompt and tool definition\n",
        "2. Create a dataset of test cases, and optionally, expected outputs\n",
        "3. Create a task to run on each test case\n",
        "4. Create evaluator(s) to run on each output of your task\n",
        "5. Visualize results in Phoenix"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IfbfQHC2jFca"
      },
      "source": [
        "### System prompt"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "A4hm-Fgq2Eh8"
      },
      "outputs": [],
      "source": [
        "SYSTEM_PROMPT_SIMPLE = \"You are a helpful assistant\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_HutfwbdSbAh"
      },
      "source": [
        "### TOOLS (17.9.2025)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "ThCMTeO-Sfj_"
      },
      "outputs": [],
      "source": [
        "TOOLS = [\n",
        "    {\n",
        "        \"name\": \"fetch-actor-details\",\n",
        "        \"description\": \"Get detailed information about an Actor by its ID or full name (format: \\\"username/name\\\", e.g., \\\"apify/rag-web-browser\\\").\\nThis returns the Actor’s title, description, URL, README (documentation), input schema, pricing/usage information, and basic stats.\\nPresent the information in a user-friendly Actor card.\\n\\nUSAGE:\\n- Use when a user asks about an Actor’s details, input schema, README, or how to use it.\\n\\nEXAMPLES:\\n- user_input: How to use apify/rag-web-browser\\n- user_input: What is the input schema for apify/rag-web-browser?\\n- user_input: What is the pricing for apify/instagram-scraper?\",\n",
        "        \"inputSchema\": {\n",
        "            \"type\": \"object\",\n",
        "            \"properties\": {\n",
        "                \"actor\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"minLength\": 1,\n",
        "                    \"description\": \"Actor ID or full name in the format \\\"username/name\\\", e.g., \\\"apify/rag-web-browser\\\".\"\n",
        "                }\n",
        "            },\n",
        "            \"required\": [\n",
        "                \"actor\"\n",
        "            ],\n",
        "            \"additionalProperties\": False,\n",
        "            \"$schema\": \"http://json-schema.org/draft-07/schema#\"\n",
        "        }\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"search-actors\",\n",
        "        \"description\": \"Search the Apify Store for Actors or Model Context Protocol (MCP) servers using keywords.\\nApify Store features solutions for web scraping, automation, and AI agents (e.g., Instagram, TikTok, LinkedIn, flights, bookings).\\n\\nThe results will include curated Actor cards with title, description, pricing model, usage statistics, and ratings.\\nFor best results, use simple space-separated keywords (e.g., \\\"instagram posts\\\", \\\"twitter profile\\\", \\\"playwright mcp\\\").\\nFor detailed information about a specific Actor, use the fetch-actor-details tool.\\n\\nUSAGE:\\n- Use when you need to discover Actors for a specific task or find MCP servers.\\n- Use to explore available tools in the Apify ecosystem based on keywords.\\n\\nEXAMPLES:\\n- user_input: Find Actors for scraping e-commerce\\n- user_input: Find browserbase MCP server\\n- user_input: I need to scrape instagram profiles and comments\\n- user_input: I need to get flights and airbnb data\",\n",
        "        \"inputSchema\": {\n",
        "            \"type\": \"object\",\n",
        "            \"properties\": {\n",
        "                \"limit\": {\n",
        "                    \"type\": \"integer\",\n",
        "                    \"minimum\": 1,\n",
        "                    \"maximum\": 100,\n",
        "                    \"default\": 10,\n",
        "                    \"description\": \"The maximum number of Actors to return. The default value is 10.\"\n",
        "                },\n",
        "                \"offset\": {\n",
        "                    \"type\": \"integer\",\n",
        "                    \"minimum\": 0,\n",
        "                    \"default\": 0,\n",
        "                    \"description\": \"The number of elements to skip at the start. The default value is 0.\"\n",
        "                },\n",
        "                \"search\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"default\": \"\",\n",
        "                    \"description\": \"A string to search for in the Actor's title, name, description, username, and readme.\\nUse simple space-separated keywords, such as \\\"web scraping\\\", \\\"data extraction\\\", or \\\"playwright browser mcp\\\".\\nDo not use complex queries, AND/OR operators, or other advanced syntax, as this tool uses full-text search only.\"\n",
        "                },\n",
        "                \"category\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"default\": \"\",\n",
        "                    \"description\": \"Filter the results by the specified category.\"\n",
        "                }\n",
        "            },\n",
        "            \"additionalProperties\": False,\n",
        "            \"$schema\": \"http://json-schema.org/draft-07/schema#\"\n",
        "        }\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"search-apify-docs\",\n",
        "        \"description\": \"Search Apify documentation using full-text search.\\n    You can use it to find relevant documentation based on keywords.\\n    Apify documentation has information about Apify console, Actors (development\\n    (actor.json, input schema, dataset schema, dockerfile), deployment, builds, runs),\\n    schedules, storages (datasets, key-value store), Proxy, Integrations,\\n    Apify Academy (crawling and webscraping with Crawlee),\\n\\n    The results will include the URL of the documentation page, a fragment identifier (if available),\\n    and a limited piece of content that matches the search query.\\n\\n    Fetch the full content of the document using the fetch-apify-docs tool by providing the URL.\\n\\n    USAGE:\\n    - Use when user asks about Apify documentation, Actor development, Crawlee, or Apify platform.\\n\\n    EXAMPLES:\\n    - query: How to use create Apify Actor?\\n    - query: How to define Actor input schema?\\n    - query: How scrape with Crawlee?\",\n",
        "        \"inputSchema\": {\n",
        "            \"type\": \"object\",\n",
        "            \"properties\": {\n",
        "                \"query\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"minLength\": 1,\n",
        "                    \"description\": \"Algolia full-text search query to find relevant documentation pages.\\nUse only keywords, do not use full sentences or questions.\\nFor example, \\\"standby actor\\\" will return documentation pages that contain the words \\\"standby\\\" and \\\"actor\\\".\"\n",
        "                },\n",
        "                \"limit\": {\n",
        "                    \"type\": \"number\",\n",
        "                    \"default\": 5,\n",
        "                    \"description\": \"Maximum number of search results to return. Defaults to 5.\\nYou can increase this limit if you need more results, but keep in mind that the search results are limited to the most relevant pages.\"\n",
        "                },\n",
        "                \"offset\": {\n",
        "                    \"type\": \"number\",\n",
        "                    \"default\": 0,\n",
        "                    \"description\": \"Offset for the search results. Defaults to 0.\\nUse this to paginate through the search results. For example, if you want to get the next 5 results, set the offset to 5 and limit to 5.\"\n",
        "                }\n",
        "            },\n",
        "            \"required\": [\n",
        "                \"query\"\n",
        "            ],\n",
        "            \"additionalProperties\": False,\n",
        "            \"$schema\": \"http://json-schema.org/draft-07/schema#\"\n",
        "        }\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"fetch-apify-docs\",\n",
        "        \"description\": \"Fetch the full content of an Apify documentation page by its URL.\\nUse this after finding a relevant page with the search-apify-docs tool.\\n\\nUSAGE:\\n- Use when you need the complete content of a specific docs page for detailed answers.\\n\\nEXAMPLES:\\n- user_input: Fetch https://docs.apify.com/platform/actors/running#builds\\n- user_input: Fetch https://docs.apify.com/academy.\",\n",
        "        \"inputSchema\": {\n",
        "            \"type\": \"object\",\n",
        "            \"properties\": {\n",
        "                \"url\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"minLength\": 1,\n",
        "                    \"description\": \"URL of the Apify documentation page to fetch. This should be the full URL, including the protocol (e.g., https://docs.apify.com/).\"\n",
        "                }\n",
        "            },\n",
        "            \"required\": [\n",
        "                \"url\"\n",
        "            ],\n",
        "            \"additionalProperties\": False,\n",
        "            \"$schema\": \"http://json-schema.org/draft-07/schema#\"\n",
        "        }\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"call-actor\",\n",
        "        \"description\": \"Call any Actor from the Apify Store using a mandatory two-step workflow.\\nThis ensures you first get the Actor’s input schema and details before executing it safely.\\n\\nThe results of a successful run include a datasetId (Actor output stored as an Apify dataset) and a short preview of items.\\nFetch the full output later using the get-actor-output tool by providing the datasetId.\\n\\nUSAGE:\\n- Use when you need to run an Actor that does not have a dedicated tool.\\n- Do not use if a dedicated tool exists (e.g., apify-slash-rag-web-browser).\\n\\nWORKFLOW:\\n- Step 1 (step=\\\"info\\\", default): Get Actor details and input schema to understand required fields.\\n- Step 2 (step=\\\"call\\\"): Provide valid input per the schema to execute the Actor. A datasetId will be returned in the result.\\n\\nEXAMPLES:\\n- user_input: Show input schema for apify/instagram-scraper (step=\\\"info\\\")\\n- user_input: Run apify/rag-web-browser with query=\\\"scrape apify.com\\\" and outputFormats=[\\\"markdown\\\"] (step=\\\"call\\\")\",\n",
        "        \"inputSchema\": {\n",
        "            \"type\": \"object\",\n",
        "            \"properties\": {\n",
        "                \"actor\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"description\": \"The name of the Actor to call. For example, \\\"apify/rag-web-browser\\\".\"\n",
        "                },\n",
        "                \"step\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"enum\": [\"info\", \"call\"],\n",
        "                    \"default\": \"info\",\n",
        "                    \"description\": \"Step to perform: \\\"info\\\" to get Actor details and input schema (required first step), \\\"call\\\" to execute the Actor (only after getting info).\"\n",
        "                },\n",
        "                \"input\": {\n",
        "                    \"type\": \"object\",\n",
        "                    \"description\": \"The input JSON to pass to the Actor. For example, {\\\"query\\\": \\\"apify\\\", \\\"maxResults\\\": 5, \\\"outputFormats\\\": [\\\"markdown\\\"]}. Required only when step is \\\"call\\\".\",\n",
        "                    \"additionalProperties\": True\n",
        "                },\n",
        "                \"callOptions\": {\n",
        "                    \"type\": \"object\",\n",
        "                    \"properties\": {\n",
        "                        \"memory\": {\n",
        "                            \"type\": \"number\",\n",
        "                            \"minimum\": 128,\n",
        "                            \"maximum\": 32768,\n",
        "                            \"description\": \"Memory allocation for the Actor in MB. Must be a power of 2 (e.g., 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768). Minimum: 128 MB, Maximum: 32768 MB (32 GB).\"\n",
        "                        },\n",
        "                        \"timeout\": {\n",
        "                            \"type\": \"number\",\n",
        "                            \"minimum\": 0,\n",
        "                            \"description\": \"Maximum runtime for the Actor in seconds. After this time elapses, the Actor will be automatically terminated. Use 0 for infinite timeout (no time limit). Minimum: 0 seconds (infinite).\"\n",
        "                        }\n",
        "                    },\n",
        "                    \"additionalProperties\": False\n",
        "                }\n",
        "            },\n",
        "            \"required\": [\n",
        "                \"actor\", \"step\"\n",
        "            ],\n",
        "            \"additionalProperties\": False,\n",
        "            \"$schema\": \"http://json-schema.org/draft-07/schema#\"\n",
        "        }\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"apify-slash-rag-web-browser\",\n",
        "        \"description\": \"This tool calls the Actor \\\"apify/rag-web-browser\\\" and retrieves its output results.\\nUse this tool instead of the \\\"call-actor\\\" if user requests this specific Actor.\\nActor description: Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs.This tool provides general web browsing functionality, for specific sites like e-commerce, social media it is always better to search for a specific Actor\",\n",
        "        \"inputSchema\": {\n",
        "            \"title\": \"RAG Web Browser\",\n",
        "            \"type\": \"object\",\n",
        "            \"schemaVersion\": 1,\n",
        "            \"properties\": {\n",
        "                \"query\": {\n",
        "                    \"title\": \"Search term or URL\",\n",
        "                    \"description\": \"**REQUIRED** Enter Google Search keywords or a URL of a specific web page. The keywords might include the [advanced search operators](https://blog.apify.com/how-to-scrape-google-like-a-pro/). Examples:\\n\\n- <code>san francisco weather</code>\\n- <code>https://www.cnn.com</code>\\n- <code>function calling site:openai.com</code>\\nExample values: \\\"web browser for RAG pipelines -site:reddit.com\\\"\",\n",
        "                    \"type\": \"string\",\n",
        "                    \"prefill\": \"web browser for RAG pipelines -site:reddit.com\",\n",
        "                    \"examples\": [\n",
        "                        \"web browser for RAG pipelines -site:reddit.com\"\n",
        "                    ]\n",
        "                },\n",
        "                \"maxResults\": {\n",
        "                    \"title\": \"Maximum results\",\n",
        "                    \"description\": \"The maximum number of top organic Google Search results whose web pages will be extracted. If `query` is a URL, then this field is ignored and the Actor only fetches the specific web page.\\nExample values: 3\",\n",
        "                    \"type\": \"integer\",\n",
        "                    \"default\": 3,\n",
        "                    \"examples\": [\n",
        "                        3\n",
        "                    ]\n",
        "                },\n",
        "                \"outputFormats\": {\n",
        "                    \"title\": \"Output formats\",\n",
        "                    \"description\": \"Select one or more formats to which the target web pages will be extracted and saved in the resulting dataset.\\nExample values: [\\\"markdown\\\"]\",\n",
        "                    \"type\": \"array\",\n",
        "                    \"default\": [\n",
        "                        \"markdown\"\n",
        "                    ],\n",
        "                    \"items\": {\n",
        "                        \"type\": \"string\",\n",
        "                        \"enum\": [\n",
        "                            \"text\",\n",
        "                            \"markdown\",\n",
        "                            \"html\"\n",
        "                        ],\n",
        "                        \"enumTitles\": [\n",
        "                            \"Plain text\",\n",
        "                            \"Markdown\",\n",
        "                            \"HTML\"\n",
        "                        ]\n",
        "                    },\n",
        "                    \"examples\": [\n",
        "                        \"markdown\"\n",
        "                    ]\n",
        "                }\n",
        "            },\n",
        "            \"required\": [\n",
        "                \"query\"\n",
        "            ],\n",
        "            \"$id\": \"https://apify.com/mcp/apify-slash-rag-web-browser/schema.json\"\n",
        "        }\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"get-actor-output\",\n",
        "        \"description\": \"Fetch the dataset of a specific Actor run based on datasetId.\\nYou can also retrieve only specific fields from the output if needed. \\n.USAGE:\\nUse this tool to get Actor dataset outside of the preview, or to access fields from the Actor output dataset schema that are not included in the preview.\\nEXAMPLES:\\n- user_input: Get data of my last Actor run?\\n- user_input: Get number_of_likes from my dataset?\\n\\nNote: This tool is automatically included if the Apify MCP Server is configured with any Actor tools (e.g. `apify-slash-rag-web-browser`) or tools that can interact with Actors (e.g. `call-actor`, `add-actor`).\",\n",
        "        \"inputSchema\": {\n",
        "            \"type\": \"object\",\n",
        "            \"properties\": {\n",
        "                \"datasetId\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"minLength\": 1,\n",
        "                    \"description\": \"Actor output dataset ID to retrieve from.\"\n",
        "                },\n",
        "                \"fields\": {\n",
        "                    \"type\": \"string\",\n",
        "                    \"description\": \"Comma-separated list of fields to include (supports dot notation like \\\"crawl.statusCode\\\"). For example: \\\"crawl.statusCode,text,metadata\\\"\"\n",
        "                },\n",
        "                \"offset\": {\n",
        "                    \"type\": \"number\",\n",
        "                    \"default\": 0,\n",
        "                    \"description\": \"Number of items to skip (default: 0).\"\n",
        "                },\n",
        "                \"limit\": {\n",
        "                    \"type\": \"number\",\n",
        "                    \"default\": 100,\n",
        "                    \"description\": \"Maximum number of items to return (default: 100).\"\n",
        "                }\n",
        "            },\n",
        "            \"required\": [\n",
        "                \"datasetId\"\n",
        "            ],\n",
        "            \"additionalProperties\": False,\n",
        "            \"$schema\": \"http://json-schema.org/draft-07/schema#\"\n",
        "        }\n",
        "    }\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mLbAf1jukQs5"
      },
      "source": [
        "### Create test cases and upload dataset to Phoenix"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Rs1wEfQPRzYE",
        "outputId": "6b38da8e-1246-437e-bfc8-bc0fddefc52f"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "running experiment evaluations |██████████| 2/2 (100.0%) | ⏳ 03:35<00:00 | 107.56s/it\n",
            "/tmp/ipykernel_462255/3148290094.py:251: DeprecationWarning: Migrate to using client.datasets.create_dataset via arize-phoenix-client\n",
            "  dataset = px_client.upload_dataset(\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "📤 Uploading dataset...\n",
            "💾 Examples uploaded: https://app.phoenix.arize.com/s/apify/datasets/RGF0YXNldDo1MQ==/examples\n",
            "🗄️ Dataset version ID: RGF0YXNldFZlcnNpb246NTE=\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>id</th>\n",
              "      <th>category</th>\n",
              "      <th>query</th>\n",
              "      <th>expectedTools</th>\n",
              "      <th>context</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>weather-mcp-search-then-call-1</td>\n",
              "      <td>flow</td>\n",
              "      <td>Now, use it to check the weather in Prague, Cz...</td>\n",
              "      <td>[call-actor]</td>\n",
              "      <td>[{'role': 'user', 'content': 'Search for weath...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "                               id category  \\\n",
              "0  weather-mcp-search-then-call-1     flow   \n",
              "\n",
              "                                               query expectedTools  \\\n",
              "0  Now, use it to check the weather in Prague, Cz...  [call-actor]   \n",
              "\n",
              "                                             context  \n",
              "0  [{'role': 'user', 'content': 'Search for weath...  "
            ]
          },
          "execution_count": 23,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import uuid\n",
        "\n",
        "id = str(uuid.uuid4())\n",
        "\n",
        "tool_responses = dict()\n",
        "\n",
        "tool_get_actor_details = {\n",
        "    # Basic actor information requests\n",
        "    \"What are the details of apify/instagram-scraper?\": \"fetch-actor-details\",\n",
        "    \"Give me the documentation for apify/rag-web-browser\": \"fetch-actor-details\",\n",
        "    \"Scrape details of apify/google-search-scraper\": \"fetch-actor-details\",\n",
        "    # Specific actor capabilities\n",
        "    \"What can apify/instagram-scraper do?\": \"fetch-actor-details\",\n",
        "    \"How does apify/rag-web-browser work?\": \"fetch-actor-details\",\n",
        "    \"Tell me about apify/social-media-hashtag-research features\": \"fetch-actor-details\",\n",
        "    # Pricing and usage information\n",
        "    \"How much does apify/instagram-scraper cost?\": \"fetch-actor-details\",\n",
        "    \"What's the pricing model for apify/rag-web-browser?\": \"fetch-actor-details\",\n",
        "    # Input schema and configuration\n",
        "    \"What parameters does apify/instagram-scraper accept?\": \"fetch-actor-details\",\n",
        "    \"Show me the input schema for apify/rag-web-browser\": \"fetch-actor-details\",\n",
        "}\n",
        "\n",
        "# search-actors\n",
        "tool_search_actors = {\n",
        "    # Social media scraping\n",
        "    \"How to search for Instagram posts\": \"search-actors\",\n",
        "    \"What are the best Instagram scrapers?\": \"search-actors\",\n",
        "    \"Find actors for scraping social media\": \"search-actors\",\n",
        "    \"Show me Twitter scraping tools\": \"search-actors\",\n",
        "    \"What actors can scrape TikTok content?\": \"search-actors\",\n",
        "    \"Find Facebook data extraction tools\": \"search-actors\",\n",
        "    \"What actors can be used for scraping social media?\": \"search-actors\",\n",
        "    # General web scraping\n",
        "    \"Show me actors for web scraping\": \"search-actors\",\n",
        "    \"Find actors that can scrape news articles\": \"search-actors\",\n",
        "    \"What tools can extract data from e-commerce sites?\": \"search-actors\",\n",
        "    \"Show me Amazon product scrapers\": \"search-actors\",\n",
        "    \"Show me actors for web scraping\": \"search-actors\",\n",
        "    \"Find actors for data extraction tasks\": \"search-actors\",\n",
        "    \"Search for Playwright browser MCP server\": \"search-actors\",\n",
        "    \"Look for actors that can scrape news articles\": \"search-actors\",\n",
        "    \"Find actors that extract data from e-commerce sites\": \"search-actors\",\n",
        "    \"I need to find solution to scrape details of Amazon products\": \"search-actors\",\n",
        "    \"Fetch posts from Twitter about AI\": \"search-actors\",\n",
        "    \"Get flight information from Skyscanner\": \"search-actors\",\n",
        "    \"Can you find actors to scrape weather data?\": \"search-actors\",\n",
        "}\n",
        "\n",
        "# rag-web browser\n",
        "tool_rag_web_browser = {\n",
        "    \"Search articles about AI from tech blogs\": \"apify-slash-rag-web-browser\",\n",
        "    \"Fetch recent articles about climate change\": \"apify-slash-rag-web-browser\",\n",
        "    \"Get the latest weather forecast for San Francisco\": \"apify-slash-rag-web-browser\",\n",
        "    \"Get data from example.com\": \"apify-slash-rag-web-browser\",\n",
        "    \"Get the latest tech industry news\": \"apify-slash-rag-web-browser\",\n",
        "}\n",
        "\n",
        "# search vs rag-web-browser\n",
        "# we want to use rag-web-browser as general purpose tool, not for everything\n",
        "tool_search_actor_vs_rag_web_browser = {\n",
        "    \"Find posts about AI on Instagram\": \"search-actors\",\n",
        "    \"Scrape Instagram posts about AI\": \"search-actors\",\n",
        "\n",
        "    \"Search for AI articles on tech blogs\": \"apify-slash-rag-web-browser\",\n",
        "    \"Fetch articles about AI from Wired and The Verge\": \"apify-slash-rag-web-browser\",\n",
        "\n",
        "    \"Get the latest weather forecast for New York\": \"apify-slash-rag-web-browser\",\n",
        "    \"Search for weather data scraping tools\": \"search-actors\",\n",
        "\n",
        "    \"Fetch flight details for New York to London\": \"search-actors\",\n",
        "    \"Find actors for flight data extraction\": \"search-actors\",\n",
        "\n",
        "    \"Look for news articles on AI\": \"apify-slash-rag-web-browser\",\n",
        "    \"Fetch AI-related news from CNN and BBC\": \"apify-slash-rag-web-browser\",\n",
        "}\n",
        "\n",
        "# DOCS\n",
        "tool_search_apify_docs = {\n",
        "    \"How to build an Apify Actor\": \"search-apify-docs\",\n",
        "    \"Ho to define Actor input schema, provide examples\": \"search-apify-docs\",\n",
        "    \"How to use Playwright library with Apify\": \"search-apify-docs\",\n",
        "    \"Is there is a documentation for MCP server\": \"search-apify-docs\",\n",
        "    \"How to use Apify Proxy\": \"search-apify-docs\",\n",
        "    \"Web scraping with Crawlee\": \"search-apify-docs\",\n",
        "    \"Apify API integration guide\": \"search-apify-docs\",\n",
        "    \"Error handling in Actors\": \"search-apify-docs\",\n",
        "}\n",
        "\n",
        "tool_call_actor_scenarios = {\n",
        "    # Direct actor calls\n",
        "    \"Run apify/instagram-scraper to scrape #dwaynejohnson\": \"call-actor\",\n",
        "    \"Run apidojo/tweet-scraper to scrape twitter profiles\": \"call-actor\",\n",
        "    \"Call apify/google-search-scraper to find restaurants in London\": \"call-actor\",\n",
        "    \"Run apify/social-media-hashtag-research for #AI\": \"call-actor\",\n",
        "    \"Scrape iPhone15 at Amazon using apify/e-commerce-scraping-tool\": \"call-actor\",\n",
        "    \"Call epctex/weather-scraper for New York\": \"call-actor\",\n",
        "}\n",
        "\n",
        "tool_actor_output_management = {\n",
        "    # get-actor-output: Retrieve output from actor executions\n",
        "    \"Get output from my latest actor with datasetId des32s\": \"get-actor-output\",\n",
        "    \"Retrieve results from dataset abc123\": \"get-actor-output\",\n",
        "    \"Show me the data from my Instagram scraper run with datasetId d23d2, \": \"get-actor-output\",\n",
        "    \"Get the first 50 items from my datasetId abc123\": \"get-actor-output\",\n",
        "    \"Retrieve all results from my web scraper with datasetID abc123\": \"get-actor-output\",\n",
        "}\n",
        "\n",
        "tool_fetch_apify_docs = {\n",
        "    \"Get configuration info from: https://docs.apify.com/platform/integrations/mcp\": \"fetch-apify-docs\",\n",
        "}\n",
        "\n",
        "# RUNS\n",
        "tool_get_actor_run = {\n",
        "    \"What is the status of the latest run of apify/instagram-scraper?\": \"get-actor-run\",\n",
        "    \"Can you fetch the status and datasetId of the run of apify/google-search-scraper?\": \"get-actor-run\",\n",
        "}\n",
        "\n",
        "tool_get_actor_run_list = {\n",
        "    \"Get the Actor that failed\": \"get-actor-run-list\",\n",
        "}\n",
        "\n",
        "tool_get_actor_log = {\n",
        "    \"Retrieve logs for the run of apify/instagram-scraper\": \"get-actor-log\",\n",
        "    \"Show the last 20 lines of logs for apify/google-search-scraper\": \"get-actor-log\",\n",
        "    \"Get the log for the latest run of apify/rag-web-browser\": \"get-actor-log\",\n",
        "}\n",
        "\n",
        "# STORAGE\n",
        "tool_get_dataset_list = {\n",
        "    \"List the datasets\": \"get-dataset-list\",\n",
        "}\n",
        "\n",
        "tool_get_dataset = {\n",
        "    \"Can you provide details for the datasetId: 123?\": \"get-dataset\",\n",
        "}\n",
        "\n",
        "tool_get_dataset_items = {\n",
        "    \"Fetch the first 10 items from the dataset apify/instagram-scraper-dataset\": \"get-dataset-items\",\n",
        "    \"Retrieve the first 5 items from the dataset apify/rag-web-browser-dataset, omitting the 'metadata.timestamp' field\": \"get-dataset-items\",\n",
        "}\n",
        "\n",
        "tool_get_dataset_schema = {\n",
        "    \"Get dataset for datasetId: xyz\": \"get-dataset-schema\",\n",
        "}\n",
        "\n",
        "tool_get_key_value_store = {\n",
        "    \"Get details of the key-value store with id: xyz\": \"get-key-value-store\",\n",
        "}\n",
        "\n",
        "tool_get_key_value_store_keys = {\n",
        "    \"Fetch the first 10 keys for the key-value store id: xyz\": \"get-key-value-store-keys\",\n",
        "}\n",
        "\n",
        "tool_get_key_value_store_record = {\n",
        "    \"Retrieve the record for key 'user-details' in the key-value store id: xyz\": \"get-key-value-store-record\",\n",
        "}\n",
        "\n",
        "tool_get_key_value_store_list = {\n",
        "    \"Show all key-value stores, including unnamed ones\": \"get-key-value-store-list\",\n",
        "}\n",
        "\n",
        "# Adding all new tool responses to the overall tool responses\n",
        "\n",
        "\n",
        "# msg = \"\"\"\n",
        "# - user: Search for the weather MCP server and then add it into the available tools\n",
        "# - assistant: I'll help you to do that\n",
        "# - tool_use: search-actors, \"input\": {\"search\": \"weather mcp\",\"limit\": 5}\n",
        "# - tool_use_id: 12, content: Tool \\\"search-actors\\\" successful, Actor found: jiri.spilka/wheather-mcp-server\n",
        "# - assistant:\n",
        "# \"\"\"\n",
        "\n",
        "# tool_responses = {\n",
        "#     \"Search for the weather MCP server and then call add-actor to it into available tools\": \"search-actors,add-actor\",\n",
        "#     msg: \"add-actor\"\n",
        "# }\n",
        "\n",
        "# CORE\n",
        "#tool_responses |= tool_get_actor_details\n",
        "#tool_responses |= tool_search_actors\n",
        "#tool_responses |= tool_rag_web_browser\n",
        "#tool_responses |= tool_search_actor_vs_rag_web_browser\n",
        "#tool_responses |= tool_actor_output_management\n",
        "#tool_responses |= tool_call_actor_scenarios\n",
        "# DOCS\n",
        "#tool_responses |= tool_search_apify_docs\n",
        "#tool_responses |= tool_fetch_apify_docs\n",
        "# RUNS\n",
        "# tool_responses |= tool_get_actor_run\n",
        "# tool_responses |= tool_get_actor_run_list\n",
        "# tool_responses |= tool_get_actor_log\n",
        "# # STORAGE\n",
        "# tool_responses |= tool_get_dataset\n",
        "# tool_responses |= tool_get_dataset_list\n",
        "# tool_responses |= tool_get_dataset_items\n",
        "# tool_responses |= tool_get_dataset_schema\n",
        "# tool_responses |= tool_get_key_value_store\n",
        "# tool_responses |= tool_get_key_value_store_keys\n",
        "# tool_responses |= tool_get_key_value_store_record\n",
        "# tool_responses |= tool_get_key_value_store_list\n",
        "\n",
        "\n",
        "tool_responses = [\n",
        "    {\n",
        "      \"id\": \"weather-mcp-search-then-call-1\",\n",
        "      \"category\": \"flow\",\n",
        "      \"query\": \"Now, use it to check the weather in Prague, Czechia?\",\n",
        "      \"expectedTools\": [\"call-actor\"],\n",
        "      \"context\": [ \n",
        "        { \"role\": \"user\", \"content\": \"Search for weather MCP server\" },\n",
        "        { \"role\": \"assistant\", \"content\": \"I'll help you to do that\" },\n",
        "        { \"role\": \"tool_use\", \"tool\": \"search-actors\", \"input\": {\"search\": \"weather mcp\", \"limit\": 5} },\n",
        "        { \"role\": \"tool_result\", \"tool_use_id\": 12, \"content\": \"Tool \\\"search-actors\\\" successful, Actor found: jiri.spilka/wheather-mcp-server\" }\n",
        "      ]\n",
        "    }\n",
        "]\n",
        "\n",
        "# ```\n",
        "# - user: Search for the weather MCP server and then add it into the available tools\n",
        "# - assistant: I'll help you to do that\n",
        "#- tool_use: search-actors, \"input\": {\"search\": \"weather mcp\",\"limit\": 5}\n",
        "#- tool_use_id: 12, content: Tool \\\"search-actors\\\" successful, Actor found: jiri.spilka/wheather-mcp-server\n",
        "#- assistant:\n",
        "#```\n",
        "\n",
        "# convert context field to plain text\n",
        "# for tr in tool_responses:\n",
        "#     context = tr.get(\"context\", [])\n",
        "#     context_str = \"\"\n",
        "#     for msg in context:\n",
        "#         role = msg.get(\"role\")\n",
        "#         content = msg.get(\"content\", \"\")\n",
        "#         if role == \"user\":\n",
        "#             context_str += f\"- user: {content}\\n\"\n",
        "#         elif role == \"assistant\":\n",
        "#             context_str += f\"- assistant: {content}\\n\"\n",
        "#         elif role == \"tool_use\":\n",
        "#             tool = msg.get(\"tool\")\n",
        "#             input_data = msg.get(\"input\", {})\n",
        "#             context_str += f\"- tool_use: {tool}, input: {json.dumps(input_data)}\\n\"\n",
        "#         elif role == \"tool_result\":\n",
        "#             tool_use_id = msg.get(\"tool_use_id\")\n",
        "#             content = msg.get(\"content\", \"\")\n",
        "#             context_str += f\"- tool_use_id: {tool_use_id}, content: {content}\\n\"\n",
        "#     tr[\"context\"] = context_str.strip()\n",
        "\n",
        "\n",
        "tool_calling_df = pd.DataFrame.from_records(tool_responses)\n",
        "\n",
        "dataset = px_client.upload_dataset(\n",
        "    dataframe=tool_calling_df,\n",
        "    dataset_name=f\"tool_calling_ground_truth_{id}\",\n",
        "    input_keys=[\"query\", \"context\"],\n",
        "    output_keys=[\"expectedTools\"],\n",
        ")\n",
        "\n",
        "tool_calling_df\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "E7cU90CEzokC"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/tmp/ipykernel_22496/3337211222.py:2: DeprecationWarning: Migrate to using client.datasets.get_dataset via arize-phoenix-client\n",
            "  dataset = px_client.get_dataset(id=dataset_id)\n"
          ]
        }
      ],
      "source": [
        "dataset_id = \"RGF0YXNldDoyOA==\"\n",
        "dataset = px_client.get_dataset(id=dataset_id)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kftvy8CrSpzZ"
      },
      "source": [
        "### Transform tools, define router"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "unIW_J3bSrsZ"
      },
      "outputs": [],
      "source": [
        "def transfrom_tools_to_openai_format(tools):\n",
        "    \"\"\" Transforms the tools to the OpenAI format.\"\"\"\n",
        "    return [\n",
        "        {\n",
        "            \"type\": \"function\",\n",
        "            \"function\": {\n",
        "                \"name\": tool[\"name\"],\n",
        "                \"description\": tool[\"description\"],\n",
        "                \"parameters\": tool[\"inputSchema\"],\n",
        "            },\n",
        "        }\n",
        "        for tool in tools\n",
        "    ]\n",
        "\n",
        "def transfrom_tools_to_antrophic_format(tools):\n",
        "    \"\"\" Transforms the tools to the Antrophic format.\"\"\"\n",
        "    from copy import deepcopy\n",
        "    t = deepcopy(tools)\n",
        "    for tool_ in t:\n",
        "        tool_[\"input_schema\"] = tool_.pop(\"inputSchema\")\n",
        "    return t\n",
        "\n",
        "\n",
        "def run_router_step(example: Example) -> str:\n",
        "    messages = [{\"role\": \"system\",\"content\": SYSTEM_PROMPT_SIMPLE}]\n",
        "    messages.append({\"role\": \"user\", \"content\": example.input.get(\"question\")})\n",
        "\n",
        "    response = openai_client.chat.completions.create(\n",
        "        model=model_name,\n",
        "        messages=messages,\n",
        "        tools=transfrom_tools_to_openai_format(TOOLS),\n",
        "    )\n",
        "    tool_calls = []\n",
        "    print(example.input.get('question'), response.choices[0].message)\n",
        "    if response.choices[0].message.tool_calls:\n",
        "        tool_calls.append(response.choices[0].message.tool_calls[0].function.name)\n",
        "    return tool_calls\n",
        "\n",
        "def run_router_step_antrophic(example: Example) -> str:\n",
        "\n",
        "  response = anthropic_client.messages.create(\n",
        "    model=model_name,\n",
        "    system=SYSTEM_PROMPT_SIMPLE,\n",
        "    messages=[{\"role\": \"user\",\"content\": example.input.get(\"question\")}],\n",
        "    tools=transfrom_tools_to_antrophic_format(TOOLS),\n",
        "    max_tokens=2048,\n",
        "  )\n",
        "\n",
        "  tool_calls = []\n",
        "  print(example.input.get('question'), response.content)\n",
        "  for content in response.content:\n",
        "    if content.type == 'tool_use':\n",
        "      tool_calls.append(content.name)\n",
        "\n",
        "  return tool_calls"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "V-e1zcdcTSkL"
      },
      "source": [
        "## Define evaluator\n",
        "Your evaluator can also be simple, since you have expected outputs. If you didn't have those expected outputs, you could instead use an LLM as a Judge here, or even basic code:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "ZSXRx4iWTTOH"
      },
      "outputs": [],
      "source": [
        "def tools_exact_match(expected: str, output: str) -> bool:\n",
        "    expected_tools = (expected.get('tool_calls') and expected.get('tool_calls').split(', ')) or []\n",
        "    print(f\"Tool output = {output}, expected = {expected_tools}, output==expected = {sorted(expected_tools) == sorted(output)}\")\n",
        "    return sorted(expected_tools) == sorted(output)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Izr6RVZ_U1mY"
      },
      "source": [
        "### Evaluation (multiple models)\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000,
          "referenced_widgets": [
            "4dec52b6f65f4ad997d549dcb652545f",
            "5c196e9311bb4170b8764f94eb766868",
            "4c0fca2991d348c6b81757987d8cf7d2",
            "7b206c4a135d4f6e8035d9121da7c13c",
            "8c49525a022549049edcd2d98468753c",
            "c70d88762a194a7a829a034da6940bfe",
            "2db670aeb3554bd6b1ce20e9b6519d1e",
            "e03222e0e51848efbdd36b04f838c4e7",
            "fbb6ca7b528f42f08325a57f2ba76581",
            "724a78b102c34d0c905da8a468ae2288",
            "68d978576a7c48bdb43c3cb09665ba86",
            "f686798eec194436a7c4966ba7302f14",
            "2e6cb507b3e141349773886202044fff",
            "07e9eb288b694f0abd9c41f714887421",
            "b4331d4ab91c42988f0dd3cdb5891a3d",
            "ade52643f8984b71ba30975ad50f9fd2",
            "c6b285dbe3cc4145997136c06e8dd68a",
            "7bcecd0687964676a248c53514d0430b",
            "ba869a403b65473a8001ce26d80b7c3b",
            "30cc1a2593264ae48efac88e033f8bb2",
            "7ba0ef6eb185420f922975eab019b2d8",
            "a98eb9ccdc1e4dd6bf753790ad870195"
          ]
        },
        "id": "mEU4YcvVU6sg",
        "outputId": "276d27b1-18d9-4849-cae7-575b8735dce2"
      },
      "outputs": [],
      "source": [
        "#SELECTED_MODELS = [\"claude-3-5-haiku-latest\"]\n",
        "\n",
        "SELECTED_MODELS = [\"gpt-4o-mini\"]\n",
        "#SELECTED_MODELS = [\"claude-sonnet-4-5-20250929\"]\n",
        "#SELECTED_MODELS = [ \"gpt-4o-mini\", \"claude-3-5-haiku-latest\"]\n",
        "#SELECTED_MODELS = [ \"gpt-4.1-nano\", \"gpt-4.1-mini\", \"gpt-4.1\"]\n",
        "#SELECTED_MODELS = [ \"gpt-5-nano\", \"gpt-5-mini\", \"gpt-5\", \"claude-sonnet-4-0\"]\n",
        "#SELECTED_MODELS = [\"gpt-4o-mini\", \"gpt-4.1-nano\", \"gpt-4.1-mini\", \"gpt-4.1\", \"gpt-5-nano\", \"gpt-5-mini\", \"gpt-5\", \"claude-3-5-haiku-latest\", \"claude-sonnet-4-0\" ]\n",
        "\n",
        "#for model_name in SELECTED_MODELS:\n",
        "\n",
        "experiment_name = f\"Eval 21 {model_name}\"\n",
        "experiment_description = model_name\n",
        "\n",
        "if model_name.startswith(\"gpt\"):\n",
        "  run_router_step = run_router_step\n",
        "elif model_name.startswith(\"claude\"):\n",
        "  run_router_step = run_router_step_antrophic\n",
        "\n",
        "experiment = run_experiment(\n",
        "    dataset,\n",
        "    run_router_step,\n",
        "    evaluators=[tools_exact_match],\n",
        "    experiment_name=experiment_name,\n",
        "    experiment_description=experiment_description,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "🧪 Experiment started.\n",
            "📺 View dataset experiments: https://app.phoenix.arize.com/s/apify/datasets/RGF0YXNldDo1MQ==/experiments\n",
            "🔗 View this experiment: https://app.phoenix.arize.com/s/apify/datasets/RGF0YXNldDo1MQ==/compare?experimentId=RXhwZXJpbWVudDoxODQ=\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "running tasks |          | 0/1 (0.0%) | ⏳ 00:00<? | ?it/s"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Running example: Example(\n",
            "    id=\"RGF0YXNldEV4YW1wbGU6MTAzMg==\",\n",
            "    \u001b[1m\u001b[94minput\u001b[0m={\n",
            "        \"context\": [\n",
            "            {\n",
            "                \"content\": \"Search for weather MCP server\",\n",
            "                \"input\": null,\n",
            "                \"role\": \"user\",\n",
            "                \"tool\": null,\n",
            "                \"tool_use_id\": null\n",
            "            },\n",
            "            {\n",
            "                \"content\": \"I'll help you to do that\",\n",
            "                \"input\": null,\n",
            "                \"role\": \"assistant\",\n",
            "                \"tool\": null,\n",
            "                \"tool_use_id\": null\n",
            "            },\n",
            "            ...\n",
            "        ],\n",
            "        \"query\": \"Now, use it to check the weather in Prague,...\"\n",
            "    },\n",
            "    \u001b[1m\u001b[94moutput\u001b[0m={\n",
            "        \"expectedTools\": [\n",
            "            \"call-actor\"\n",
            "        ]\n",
            "    },\n",
            ")\n",
            "Messages to model: [{'role': 'system', 'content': 'You are a helpful assistant'}, {'role': 'user', 'content': 'My previous interaction with the assistant: [{\\'role\\': \\'user\\', \\'tool\\': None, \\'input\\': None, \\'content\\': \\'Search for weather MCP server\\', \\'tool_use_id\\': None}, {\\'role\\': \\'assistant\\', \\'tool\\': None, \\'input\\': None, \\'content\\': \"I\\'ll help you to do that\", \\'tool_use_id\\': None}, {\\'role\\': \\'tool_use\\', \\'tool\\': \\'search-actors\\', \\'input\\': {\\'limit\\': 5, \\'search\\': \\'weather mcp\\'}, \\'content\\': None, \\'tool_use_id\\': None}, {\\'role\\': \\'tool_result\\', \\'tool\\': None, \\'input\\': None, \\'content\\': \\'Tool \"search-actors\" successful, Actor found: jiri.spilka/wheather-mcp-server\\', \\'tool_use_id\\': 12.0}]'}, {'role': 'user', 'content': 'User query: Now, use it to check the weather in Prague, Czechia?'}]\n",
            "Model response: Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='call_JfD4WvSGv8R5Mi9XlhWtqd1c', function=Function(arguments='{\"actor\":\"jiri.spilka/wheather-mcp-server\",\"step\":\"info\"}', name='call-actor'), type='function', index=0)], reasoning=None), native_finish_reason='tool_calls')\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "running tasks |██████████| 1/1 (100.0%) | ⏳ 00:02<00:00 |  2.09s/it"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "✅ Task runs completed.\n",
            "🧠 Evaluation started.\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": []
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Evaluating tools match. Expected: {'expectedTools': ['call-actor']}, Output: {'query': 'Now, use it to check the weather in Prague, Czechia?', 'context': [{'role': 'user', 'tool': None, 'input': None, 'content': 'Search for weather MCP server', 'tool_use_id': None}, {'role': 'assistant', 'tool': None, 'input': None, 'content': \"I'll help you to do that\", 'tool_use_id': None}, {'role': 'tool_use', 'tool': 'search-actors', 'input': {'limit': 5, 'search': 'weather mcp'}, 'content': None, 'tool_use_id': None}, {'role': 'tool_result', 'tool': None, 'input': None, 'content': 'Tool \"search-actors\" successful, Actor found: jiri.spilka/wheather-mcp-server', 'tool_use_id': 12.0}], 'reference': '', 'tool_calls': [{'id': 'call_JfD4WvSGv8R5Mi9XlhWtqd1c', 'type': 'function', 'index': 0, 'function': {'name': 'call-actor', 'arguments': '{\"actor\":\"jiri.spilka/wheather-mcp-server\",\"step\":\"info\"}'}}], 'llm_response': ''}\n",
            "Tools match: score=1.0, output=['call-actor'], expected=['call-actor']\n",
            "Evaluating tool calling. Input: {'query': 'Now, use it to check the weather in Prague, Czechia?', 'context': [{'role': 'user', 'tool': None, 'input': None, 'content': 'Search for weather MCP server', 'tool_use_id': None}, {'role': 'assistant', 'tool': None, 'input': None, 'content': \"I'll help you to do that\", 'tool_use_id': None}, {'role': 'tool_use', 'tool': 'search-actors', 'input': {'limit': 5, 'search': 'weather mcp'}, 'content': None, 'tool_use_id': None}, {'role': 'tool_result', 'tool': None, 'input': None, 'content': 'Tool \"search-actors\" successful, Actor found: jiri.spilka/wheather-mcp-server', 'tool_use_id': 12.0}]}, Output: {'query': 'Now, use it to check the weather in Prague, Czechia?', 'context': [{'role': 'user', 'tool': None, 'input': None, 'content': 'Search for weather MCP server', 'tool_use_id': None}, {'role': 'assistant', 'tool': None, 'input': None, 'content': \"I'll help you to do that\", 'tool_use_id': None}, {'role': 'tool_use', 'tool': 'search-actors', 'input': {'limit': 5, 'search': 'weather mcp'}, 'content': None, 'tool_use_id': None}, {'role': 'tool_result', 'tool': None, 'input': None, 'content': 'Tool \"search-actors\" successful, Actor found: jiri.spilka/wheather-mcp-server', 'tool_use_id': 12.0}], 'reference': '', 'tool_calls': [{'id': 'call_JfD4WvSGv8R5Mi9XlhWtqd1c', 'type': 'function', 'index': 0, 'function': {'name': 'call-actor', 'arguments': '{\"actor\":\"jiri.spilka/wheather-mcp-server\",\"step\":\"info\"}'}}], 'llm_response': ''}, Expected: {'expectedTools': ['call-actor']}\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": []
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Tool calling evaluation result: [Score(name='tool_calling', score=1.0, label='correct', explanation=\"The tool 'call-actor' was chosen correctly to execute the Actor 'jiri.spilka/wheather-mcp-server' in order to check the weather in Prague, as it allows for executing the Actor after retrieving its details.\", metadata={'model': 'gpt-4o-mini'}, source='llm', direction='maximize')] (Score: 1.0)\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": []
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "🔗 View this experiment: https://app.phoenix.arize.com/s/apify/datasets/RGF0YXNldDo1MQ==/compare?experimentId=RXhwZXJpbWVudDoxODQ=\n",
            "\n",
            "Experiment Summary (10/16/25 10:01 PM +0200)\n",
            "--------------------------------------------\n",
            "                evaluator  n  n_scores  avg_score\n",
            "0  tool_calling_evaluator  1         1        1.0\n",
            "1       tools_exact_match  1         1        1.0\n",
            "\n",
            "Tasks Summary (10/16/25 10:01 PM +0200)\n",
            "---------------------------------------\n",
            "   n_examples  n_runs  n_errors\n",
            "0           1       1         0\n"
          ]
        }
      ],
      "source": [
        "from phoenix.experiments.evaluators import create_evaluator\n",
        "from typing import Any, Dict\n",
        "from openai.types.chat.chat_completion import ChatCompletionMessage    \n",
        "from phoenix.evals import create_classifier\n",
        "from phoenix.evals.llm import LLM\n",
        "\n",
        "from openai import OpenAI\n",
        "\n",
        "client = OpenAI(\n",
        "  base_url=\"https://openrouter.ai/api/v1\",\n",
        "  api_key=os.getenv(\"OPENROUTER_API_KEY\"),\n",
        ")\n",
        "\n",
        "\n",
        "def tools_exact_match(expected: dict, output: dict) -> tuple[float, str]:\n",
        "    \"\"\" Evaluator to check if the tools called in the output match the expected tools. \n",
        "    \n",
        "    Expected contains expectedTools key with a list of expected tool names.\n",
        "    Output contains tool_calls key with a list of tool call objects, each having a function key with name of the tool called.\n",
        "    \"\"\"\n",
        "\n",
        "    print(f\"Evaluating tools match. Expected: {expected}, Output: {output}\")\n",
        "\n",
        "    expected_tools = expected.get('expectedTools', [])\n",
        "    if isinstance(expected_tools, str):\n",
        "        expected_tools = expected_tools.split(', ')\n",
        "\n",
        "    if not expected_tools:\n",
        "        return 1.0, \"No expected tools provided\"\n",
        "\n",
        "    expected_tools = sorted(expected_tools)\n",
        "\n",
        "    output_tools = []\n",
        "    if output and (tools := output.get('tool_calls')):\n",
        "        output_tools = sorted([tool_call.get('function', {}).get('name', '') for tool_call in tools])\n",
        "\n",
        "    is_correct = expected_tools == output_tools\n",
        "    score = 1.0 if is_correct else 0.0\n",
        "    explanation = f\"Expected: {expected_tools}, Got: {output_tools}\"\n",
        "    \n",
        "    print(f\"Tools match: score={score}, output={output_tools}, expected={expected_tools}\")\n",
        "    return score, explanation\n",
        "\n",
        "TOOL_CALLING_BASE_TEMPLATE = \"\"\"\n",
        "You are an evaluation assistant evaluating user queries and tool calls to\n",
        "determine whether a right tool was chosen.\n",
        "The tool calls have been generated by a separate agent, and chosen from the list of\n",
        "tools provided below. It is your job to decide whether that agent chose\n",
        "the right tool to call.\n",
        "\n",
        "    [BEGIN DATA]\n",
        "    ************\n",
        "    [Previous user interactions]: {context}\n",
        "    [User query]: {query}\n",
        "    ************\n",
        "    [Tool called]: {tool_calls}\n",
        "    [LLM response]: {llm_response}\n",
        "    [END DATA]\n",
        "\n",
        "    \n",
        "DECISION: [correct or incorrect]\n",
        "EXPLANATION: [Super short explanation of why the tool choice was correct or incorrect]\n",
        "\n",
        "Your response must be single word, either \"correct\" or \"incorrect\",\n",
        "and should not contain any text or characters aside from that word.\n",
        "\"incorrect\" means that the chosen tool was not correrly \n",
        "or that the tool signature includes parameter values that don't match\n",
        "the formats specified in the tool signatures below.\n",
        "\n",
        "\"correct\" means the correct tool call was chosen, the correct parameters\n",
        "were extracted from the query, the tool call generated is runnable and correct,\n",
        "and that no outside information not present in the query was used\n",
        "in the generated query.\n",
        "\n",
        "[Reference instructions]: {reference}\n",
        "\n",
        "[Tool Definitions]: {tool_definitions}\n",
        "\"\"\"\n",
        "\n",
        "llm = LLM(provider=\"openai\", model=\"gpt-4o-mini\")\n",
        "\n",
        "# The fields in the prompt template will be filled in by the evaluator, so they must match the router step output keys\n",
        "accuracy_eval = create_classifier(\n",
        "    name=\"tool_calling\",\n",
        "    prompt_template=TOOL_CALLING_BASE_TEMPLATE,\n",
        "    llm=llm,\n",
        "    choices={\"correct\": 1.0, \"incorrect\": 0.0},\n",
        ")\n",
        "\n",
        "@create_evaluator(kind=\"llm\")\n",
        "def tool_calling_evaluator(input: Dict[str, Any], output: Dict[str, Any], expected: Dict[str, Any]) -> dict:\n",
        "    \"\"\"Evaluator using Phoenix classifier - more robust than direct LLM calls.\"\"\"\n",
        "    \n",
        "    print(f\"Evaluating tool calling. Input: {input}, Output: {output}, Expected: {expected}\")\n",
        "    \n",
        "    eval_input = {\n",
        "        \"query\": input.get(\"query\", \"\"),\n",
        "        \"context\": input.get(\"context\", \"\"),\n",
        "        \"tool_calls\": output.get(\"tool_calls\", []),\n",
        "        \"llm_response\": output.get(\"llm_response\", \"\"),\n",
        "        \"reference\": expected.get(\"reference\", \"\"),\n",
        "        \"tool_definitions\": str(TOOLS)\n",
        "    }\n",
        "    \n",
        "    try:\n",
        "        result = accuracy_eval.evaluate(eval_input)\n",
        "        print(f\"Tool calling evaluation result: {result} (Score: {result[0].score})\")\n",
        "        return result[0].score, result[0].explanation\n",
        "    except Exception as e:\n",
        "        print(f\"Evaluation failed: {e}\")\n",
        "        return 0.0, f\"Evaluation failed: {e}\"\n",
        "\n",
        "\n",
        "def run_router_step(example: Example) -> dict:\n",
        "    \"\"\"\n",
        "    Run a single step of the evaluation process. \n",
        "    \n",
        "    Uses the OpenRouter client to call the model with the given example input.\n",
        "    Returns the model's response message.\n",
        "    \"\"\"\n",
        "\n",
        "    print(f\"Running example: {example}\")\n",
        "\n",
        "    context = example.input.get(\"context\",\"\")\n",
        "    query = example.input.get(\"query\",\"\")\n",
        "\n",
        "    messages = [{\"role\": \"system\",\"content\": SYSTEM_PROMPT_SIMPLE}]\n",
        "\n",
        "    if context:\n",
        "        messages.append({\"role\": \"user\",\"content\": f\"My previous interaction with the assistant: {context}\"})\n",
        "\n",
        "    messages.append(\n",
        "        {\n",
        "            \"role\": \"user\",\n",
        "            \"content\": f\"User query: {query if query else ''}\",\n",
        "        }\n",
        "    )\n",
        "\n",
        "    print(f\"Messages to model: {messages}\")\n",
        "\n",
        "    response = client.chat.completions.create(\n",
        "        model=\"openai/gpt-4o-mini\",\n",
        "        # model=\"anthropic/claude-3.5-haiku\",\n",
        "        # model=\"google/gemini-2.5-flash\",\n",
        "        messages=messages,\n",
        "        tools=transfrom_tools_to_openai_format(TOOLS)\n",
        "    )\n",
        "\n",
        "    print(f\"Model response: {response.choices[0]}\")\n",
        "\n",
        "    return{\n",
        "        \"tool_calls\": response.choices[0].message.tool_calls or [],\n",
        "        \"llm_response\": response.choices[0].message.content or \"\",\n",
        "        \"query\": example.input.get(\"query\",\"\"),\n",
        "        \"context\": example.input.get(\"context\",\"\"),\n",
        "        \"reference\": example.output.get(\"reference\",\"\")\n",
        "    }\n",
        "\n",
        "\n",
        "experiment_name = f\"Eval 21\"\n",
        "experiment_description = \"Testing tool calling with OpenRouter\"\n",
        "\n",
        "experiment = run_experiment(\n",
        "    dataset,\n",
        "    run_router_step,\n",
        "    evaluators=[tools_exact_match, tool_calling_evaluator],\n",
        "    experiment_name=experiment_name,\n",
        "    experiment_description=experiment_description,\n",
        ")\n",
        "\n",
        "# r = accuracy_eval.evaluate({\n",
        "#     \"query\": \"What are the details of apify/instagram-scraper?\",\n",
        "#     \"context\": \"\",\n",
        "#     \"llm_response\": \"Tool calls made: ['fetch-actor-details']\",\n",
        "#     \"reference\": \"Called the tool fetch-actor-details with input apify/instagram-scraper\",\n",
        "#     \"tool_definitions\": str(TOOLS)\n",
        "# })\n",
        "\n",
        "# print(f\"Tool calling evaluation result: {r} (Score: {r[0].score})\")\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 269
        },
        "id": "JlUsFjQY-DSm",
        "outputId": "f68aaae1-9084-4d12-f366-9c49be97f320"
      },
      "outputs": [],
      "source": [
        "query = (\n",
        "    SpanQuery()\n",
        "    .where(\n",
        "        \"span_kind == 'AGENT'\",\n",
        "    )\n",
        "    .select(question=\"input.value\", output_messages=\"llm.output_messages\", tool_definitions=\"llm.tools\")\n",
        ")\n",
        "\n",
        "# The Phoenix Client can take this query and return the dataframe.\n",
        "tool_calls_df = px_client.query_spans(query, project_name=project_name, timeout=None)\n",
        "\n",
        "tool_calls_df.dropna(subset=[\"output_messages\"], inplace=True)\n",
        "tool_calls_df\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 269
        },
        "id": "oJc5BJgffoVk",
        "outputId": "ffc30afa-c483-48b0-ffe3-5ab520d4441f"
      },
      "outputs": [],
      "source": [
        "OVERRIDE_TOOLS = False\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "def simulate_tools_output_openai(query):\n",
        "    \"\"\" Run query with system prompt and tools.\"\"\"\n",
        "    response = openai_client.chat.completions.create(\n",
        "        model=\"gpt-4o-mini\",\n",
        "        messages=[\n",
        "            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
        "            {\"role\": \"user\", \"content\": query}],\n",
        "        tools=transfrom_tools_to_openai_format(TOOLS),\n",
        "        max_tokens=2048,\n",
        "    )\n",
        "    return response.choices[0].message.to_dict()[\"tool_calls\"]\n",
        "\n",
        "\n",
        "def simulate_tools_output_anthropic(prompt):\n",
        "    \"\"\" Run query with system prompt and tools.\"\"\"\n",
        "    message = anthropic_client.messages.create(\n",
        "        model=\"claude-3-5-haiku-latest\",\n",
        "        system=SYSTEM_PROMPT,\n",
        "        messages=[\n",
        "            {\n",
        "                \"role\": \"user\",\n",
        "                \"content\": prompt,\n",
        "            }\n",
        "        ],\n",
        "        tools=TOOLS,\n",
        "        max_tokens=2048,\n",
        "    )\n",
        "    return message.content[-1].to_dict()\n",
        "\n",
        "if OVERRIDE_TOOLS:\n",
        "    # tool_calls_df[\"tool_call\"] = tool_calls_df[\"question\"].progress_apply(simulate_tools_output_openai)\n",
        "    tool_calls_df[\"tool_call\"] = tool_calls_df[\"question\"].progress_apply(simulate_tools_output_anthropic)\n",
        "\n",
        "tool_calls_df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RYmOjICSfoVj"
      },
      "source": [
        "## Opt: Simulate other tool definitions using user prompts"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "K141mQmECXKO"
      },
      "source": [
        "## Transform data\n",
        "\n",
        "Get tool calls from conversions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1qYRfwYq-DSn"
      },
      "outputs": [],
      "source": [
        "def get_tool_calls(conversation_data):\n",
        "    \"\"\"\n",
        "    Extract function calls from conversation data in the format:\n",
        "    [{tool: \"name_of_tool\", input: {key: value}, output: {...}}, ...]\n",
        "\n",
        "    Args:\n",
        "        conversation_data (str or list): JSON string or parsed conversation data\n",
        "\n",
        "    Returns:\n",
        "        list: Array of function call objects with tool, input, and output\n",
        "    \"\"\"\n",
        "    # Parse JSON if it's a string\n",
        "    if isinstance(conversation_data, str):\n",
        "        data = json.loads(conversation_data)\n",
        "    else:\n",
        "        data = conversation_data\n",
        "\n",
        "    function_calls = []\n",
        "    tool_calls_map = {}  # Map tool IDs to their calls for matching with results\n",
        "\n",
        "    # First pass: collect tool calls\n",
        "    for message in data:\n",
        "        if isinstance(message.get(\"content\"), list):\n",
        "            for content_item in message[\"content\"]:\n",
        "                if content_item.get(\"type\") == \"tool_use\":\n",
        "                    tool_id = content_item.get(\"id\")\n",
        "                    tool_call = {\n",
        "                        \"tool\": content_item.get(\"name\"),\n",
        "                        \"input\": content_item.get(\"input\", {}),\n",
        "                        # 'output': None  # Will be filled in second pass\n",
        "                    }\n",
        "                    tool_calls_map[tool_id] = tool_call\n",
        "                    function_calls.append(tool_call)\n",
        "\n",
        "    # Second pass: match tool results with tool calls\n",
        "    # for message in data:\n",
        "    #     if isinstance(message.get('content'), list):\n",
        "    #         for content_item in message['content']:\n",
        "    #             if content_item.get('type') == 'tool_result':\n",
        "    #                 tool_id = content_item.get('tool_use_id')\n",
        "    #                 if tool_id in tool_calls_map:\n",
        "    #                     # Extract output from tool result\n",
        "    #                     result_content = content_item.get('content', [])\n",
        "    #                     output = {}\n",
        "\n",
        "    #                     # Parse the result content\n",
        "    #                     for result_item in result_content:\n",
        "    #                         if result_item.get('type') == 'text':\n",
        "    #                             text = result_item.get('text', '')\n",
        "\n",
        "    #                             # Try to parse JSON from the text if it looks like JSON\n",
        "    #                             if text.startswith('{') and text.endswith('}'):\n",
        "    #                                 try:\n",
        "    #                                     output = json.loads(text)\n",
        "    #                                 except json.JSONDecodeError:\n",
        "    #                                     output = {'text': text}\n",
        "    #                             else:\n",
        "    #                                 # If not JSON, store as text\n",
        "    #                                 if 'text' not in output:\n",
        "    #                                     output['text'] = text\n",
        "    #                                 else:\n",
        "    #                                     output['text'] += '\\n' + text\n",
        "\n",
        "    #                     # Update the tool call with output\n",
        "    #                     tool_calls_map[tool_id]['output'] = output\n",
        "\n",
        "    return function_calls\n",
        "\n",
        "\n",
        "if not OVERRIDE_TOOLS:\n",
        "    # Transform only original data, not overridden tools\n",
        "    tool_calls_df[\"tool_call\"] = tool_calls_df[\"output_messages\"].apply(get_tool_calls)\n",
        "    tool_calls_df.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A94A0vKVCkro"
      },
      "source": [
        "## Evaluation\n",
        "\n",
        "Run LLM template to evaluate each conversation. Check if the tool usage was correct."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "referenced_widgets": [
            "d3225651a50f45e4a1f8e5bcbfd225aa",
            "b5a6d7213b8744bf80fd5bdbe3c9a045",
            "0ac303456f80441cbad8c522c4993ec9",
            "2b097bbd7eb241d591120d1c1d950ca0",
            "89ab7e9a613c48ea947399b8dcf63ed0",
            "fb731c4bad1c438eb4f4504b63266e79",
            "c69b3894c13547d59f8feecb60bea4a4",
            "0c90ab92e8e14757862e4d62380578e1",
            "f478b098309c47d7b73251f03354ce70",
            "b5cfbf557ac845d398f06aefcc054d5c",
            "e6ac19d06ad646cebcdbe20a65d22d3a"
          ]
        },
        "id": "0QywbpHq-DSn",
        "outputId": "5ed733af-2a93-45d0-d0ca-91cc5514bd79"
      },
      "outputs": [],
      "source": [
        "TOOL_CALLING_BASE_TEMPLATE = \"\"\"\n",
        "You are an evaluation assistant evaluating questions and tool calls to\n",
        "determine whether the tool called would answer the question. The tool\n",
        "calls have been generated by a separate agent, and chosen from the list of\n",
        "tools provided below. It is your job to decide whether that agent chose\n",
        "the right tool to call.\n",
        "\n",
        "    [BEGIN DATA]\n",
        "    ************\n",
        "    [Question]: {question}\n",
        "    ************\n",
        "    [Tool Called]: {tool_call}\n",
        "    [END DATA]\n",
        "\n",
        "Your response must be single word, either \"correct\" or \"incorrect\",\n",
        "and should not contain any text or characters aside from that word.\n",
        "\"incorrect\" means that the chosen tool would not answer the question,\n",
        "the tool includes information that is not presented in the question,\n",
        "or that the tool signature includes parameter values that don't match\n",
        "the formats specified in the tool signatures below.\n",
        "\n",
        "\"correct\" means the correct tool call was chosen, the correct parameters\n",
        "were extracted from the question, the tool call generated is runnable and correct,\n",
        "and that no outside information not present in the question was used\n",
        "in the generated question.\n",
        "\n",
        "[Tool Definitions]: {tool_definitions}\n",
        "\"\"\"\n",
        "\n",
        "from phoenix.evals import OpenAIModel, llm_classify\n",
        "\n",
        "rails = [\"incorrect\", \"correct\"]\n",
        "\n",
        "\n",
        "tool_call_eval = llm_classify(\n",
        "    data=tool_calls_df,\n",
        "    template=TOOL_CALLING_BASE_TEMPLATE,\n",
        "    rails=[\"correct\", \"incorrect\"],\n",
        "    model=eval_model,\n",
        "    provide_explanation=True,\n",
        ")\n",
        "\n",
        "tool_call_eval[\"score\"] = tool_call_eval.apply(\n",
        "    lambda x: 1 if x[\"label\"] == \"correct\" else 0, axis=1\n",
        ")\n",
        "\n",
        "tool_call_eval.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Fr19MqWrC3Sw"
      },
      "source": [
        "## Push evaluation results back to Phoenix\n",
        "\n",
        "In Phoenix UI will be visible results of evaluation"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aw-FgQ3g-DSn"
      },
      "outputs": [],
      "source": [
        "px_client.log_evaluations(\n",
        "    SpanEvaluations(eval_name=\"Tool Calling Eval (JS)\", dataframe=tool_call_eval),\n",
        ")"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "K141mQmECXKO",
        "A94A0vKVCkro",
        "Fr19MqWrC3Sw"
      ],
      "provenance": []
    },
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.3"
    },
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "07e9eb288b694f0abd9c41f714887421": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "FloatProgressModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_ba869a403b65473a8001ce26d80b7c3b",
            "max": 64,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_30cc1a2593264ae48efac88e033f8bb2",
            "value": 64
          }
        },
        "0ac303456f80441cbad8c522c4993ec9": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "FloatProgressModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_0c90ab92e8e14757862e4d62380578e1",
            "max": 6,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_f478b098309c47d7b73251f03354ce70",
            "value": 0
          }
        },
        "0c90ab92e8e14757862e4d62380578e1": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "1a4c748eced043a8be3ffa2badfd2aab": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "1bb2b61dfded427a94c6f1d45fd7ef0f": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "23e16949a2914a2ab7a789d336c74b70": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "2b097bbd7eb241d591120d1c1d950ca0": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_b5cfbf557ac845d398f06aefcc054d5c",
            "placeholder": "​",
            "style": "IPY_MODEL_e6ac19d06ad646cebcdbe20a65d22d3a",
            "value": " 0/6 (0.0%) | ⏳ 00:04&lt;? | ?it/s"
          }
        },
        "2db670aeb3554bd6b1ce20e9b6519d1e": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "2e23bca9307747fab7eb905c7dfc2e80": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "2e6cb507b3e141349773886202044fff": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_c6b285dbe3cc4145997136c06e8dd68a",
            "placeholder": "​",
            "style": "IPY_MODEL_7bcecd0687964676a248c53514d0430b",
            "value": "running experiment evaluations "
          }
        },
        "2fb3300a9c774fac8ce8d738862b80fa": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "30cc1a2593264ae48efac88e033f8bb2": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "ProgressStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "4701b6567eed48ed99fa38a650b8f66c": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_fa690f3ac0b14454bf08c74dd8e4de08",
            "placeholder": "​",
            "style": "IPY_MODEL_d4bf999940954bceafa9bdc17497eaac",
            "value": "running experiment evaluations "
          }
        },
        "4c0fca2991d348c6b81757987d8cf7d2": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "FloatProgressModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_e03222e0e51848efbdd36b04f838c4e7",
            "max": 64,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_fbb6ca7b528f42f08325a57f2ba76581",
            "value": 63
          }
        },
        "4dec52b6f65f4ad997d549dcb652545f": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HBoxModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_5c196e9311bb4170b8764f94eb766868",
              "IPY_MODEL_4c0fca2991d348c6b81757987d8cf7d2",
              "IPY_MODEL_7b206c4a135d4f6e8035d9121da7c13c"
            ],
            "layout": "IPY_MODEL_8c49525a022549049edcd2d98468753c"
          }
        },
        "5c196e9311bb4170b8764f94eb766868": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_c70d88762a194a7a829a034da6940bfe",
            "placeholder": "​",
            "style": "IPY_MODEL_2db670aeb3554bd6b1ce20e9b6519d1e",
            "value": "running tasks "
          }
        },
        "66e38b9481954fdc9dc5bcae47d9e035": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "ProgressStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "68d978576a7c48bdb43c3cb09665ba86": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "6ce730fe5d2f442ea877b0b617e62987": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "FloatProgressModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_cae08e89b7a246df964b8921f11efd29",
            "max": 64,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_66e38b9481954fdc9dc5bcae47d9e035",
            "value": 64
          }
        },
        "724a78b102c34d0c905da8a468ae2288": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "73a3150b6d97444e862bb27742aa7a8b": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "751f78b8e1f041bb915f66bcd8a1b06b": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HBoxModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_f9fa3c13ec9f41c48e79316150706e50",
              "IPY_MODEL_e9b6c8ac94344ca384e2313536c530a6",
              "IPY_MODEL_9a3cdb4064f444a3ba1c3ffe17f45259"
            ],
            "layout": "IPY_MODEL_2fb3300a9c774fac8ce8d738862b80fa"
          }
        },
        "7b206c4a135d4f6e8035d9121da7c13c": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_724a78b102c34d0c905da8a468ae2288",
            "placeholder": "​",
            "style": "IPY_MODEL_68d978576a7c48bdb43c3cb09665ba86",
            "value": " 63/64 (98.4%) | ⏳ 03:13&lt;00:02 |  2.46s/it"
          }
        },
        "7ba0ef6eb185420f922975eab019b2d8": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "7bcecd0687964676a248c53514d0430b": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "89ab7e9a613c48ea947399b8dcf63ed0": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "8c49525a022549049edcd2d98468753c": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "9a3cdb4064f444a3ba1c3ffe17f45259": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_1a4c748eced043a8be3ffa2badfd2aab",
            "placeholder": "​",
            "style": "IPY_MODEL_23e16949a2914a2ab7a789d336c74b70",
            "value": " 64/64 (100.0%) | ⏳ 23:11&lt;00:00 |  1.04s/it"
          }
        },
        "9d0d402b9d5847e088e23127f860ef4c": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "a68420722733427fa6d6c6744e73c8e2": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "ProgressStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "a98eb9ccdc1e4dd6bf753790ad870195": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "ade52643f8984b71ba30975ad50f9fd2": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "b4331d4ab91c42988f0dd3cdb5891a3d": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_7ba0ef6eb185420f922975eab019b2d8",
            "placeholder": "​",
            "style": "IPY_MODEL_a98eb9ccdc1e4dd6bf753790ad870195",
            "value": " 64/64 (100.0%) | ⏳ 00:07&lt;00:00 |  7.62it/s"
          }
        },
        "b5a6d7213b8744bf80fd5bdbe3c9a045": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_fb731c4bad1c438eb4f4504b63266e79",
            "placeholder": "​",
            "style": "IPY_MODEL_c69b3894c13547d59f8feecb60bea4a4",
            "value": "llm_classify "
          }
        },
        "b5cfbf557ac845d398f06aefcc054d5c": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "ba869a403b65473a8001ce26d80b7c3b": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "c69b3894c13547d59f8feecb60bea4a4": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "c6b285dbe3cc4145997136c06e8dd68a": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "c70d88762a194a7a829a034da6940bfe": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "cae08e89b7a246df964b8921f11efd29": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "d1a6ece58fa44399a5cfdc58d60b488f": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HBoxModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_4701b6567eed48ed99fa38a650b8f66c",
              "IPY_MODEL_6ce730fe5d2f442ea877b0b617e62987",
              "IPY_MODEL_ded2b5f843504f1eb420168d6c4de2f2"
            ],
            "layout": "IPY_MODEL_da57d261dd3a43e9bb1e23f412a2bd55"
          }
        },
        "d3225651a50f45e4a1f8e5bcbfd225aa": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HBoxModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_b5a6d7213b8744bf80fd5bdbe3c9a045",
              "IPY_MODEL_0ac303456f80441cbad8c522c4993ec9",
              "IPY_MODEL_2b097bbd7eb241d591120d1c1d950ca0"
            ],
            "layout": "IPY_MODEL_89ab7e9a613c48ea947399b8dcf63ed0"
          }
        },
        "d4bf999940954bceafa9bdc17497eaac": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "da57d261dd3a43e9bb1e23f412a2bd55": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "ded2b5f843504f1eb420168d6c4de2f2": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_1bb2b61dfded427a94c6f1d45fd7ef0f",
            "placeholder": "​",
            "style": "IPY_MODEL_e8b56c93cbdd4507aceb37ffe6dd5992",
            "value": " 64/64 (100.0%) | ⏳ 00:46&lt;00:00 |  7.29it/s"
          }
        },
        "e03222e0e51848efbdd36b04f838c4e7": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "e6ac19d06ad646cebcdbe20a65d22d3a": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "e8b56c93cbdd4507aceb37ffe6dd5992": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "e9b6c8ac94344ca384e2313536c530a6": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "FloatProgressModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_73a3150b6d97444e862bb27742aa7a8b",
            "max": 64,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_a68420722733427fa6d6c6744e73c8e2",
            "value": 64
          }
        },
        "f478b098309c47d7b73251f03354ce70": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "ProgressStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "f686798eec194436a7c4966ba7302f14": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HBoxModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_2e6cb507b3e141349773886202044fff",
              "IPY_MODEL_07e9eb288b694f0abd9c41f714887421",
              "IPY_MODEL_b4331d4ab91c42988f0dd3cdb5891a3d"
            ],
            "layout": "IPY_MODEL_ade52643f8984b71ba30975ad50f9fd2"
          }
        },
        "f9fa3c13ec9f41c48e79316150706e50": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_9d0d402b9d5847e088e23127f860ef4c",
            "placeholder": "​",
            "style": "IPY_MODEL_2e23bca9307747fab7eb905c7dfc2e80",
            "value": "running tasks "
          }
        },
        "fa690f3ac0b14454bf08c74dd8e4de08": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "fb731c4bad1c438eb4f4504b63266e79": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "fbb6ca7b528f42f08325a57f2ba76581": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "ProgressStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        }
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
